knitr document van Steensel lab

TF reporter cDNA reads processing - Deep P53/GR scan - stimulation 1

Introduction

I previously processed the raw sequencing data, quantified the pDNA data, and normalized the cDNA data. In this script, I take a detailed look at the cDNA data from a general perspective.

Analysis

First insights into data distribution - reporter activity distribution plots

Explain expression differences between the different affinities

The site closest to the minimal promoter determines the activity. Can neighboring sites even inhibit this effect?

Ridge/Lasso regression

## 
## Call:  glm(formula = log_reporter_activity ~ affinity_pos1 + affinity_pos2 + 
##     affinity_pos3 + affinity_pos4, family = "gaussian", data = cDNA_df_p53)
## 
## Coefficients:
##               (Intercept)  affinity_pos1_1_very-weak  
##                   0.74835                    0.17093  
##      affinity_pos1_2_weak     affinity_pos1_3_medium  
##                   0.14411                    0.03793  
##    affinity_pos1_4_strong  affinity_pos2_1_very-weak  
##                   0.19941                    0.09549  
##      affinity_pos2_2_weak     affinity_pos2_3_medium  
##                  -0.14039                   -0.17676  
##    affinity_pos2_4_strong  affinity_pos3_1_very-weak  
##                  -0.09254                    0.42701  
##      affinity_pos3_2_weak     affinity_pos3_3_medium  
##                   0.25897                   -0.02712  
##    affinity_pos3_4_strong  affinity_pos4_1_very-weak  
##                   0.10787                    0.77533  
##      affinity_pos4_2_weak     affinity_pos4_3_medium  
##                   0.98144                    0.74495  
##    affinity_pos4_4_strong  
##                   0.85772  
## 
## Degrees of Freedom: 1025 Total (i.e. Null);  1009 Residual
## Null Deviance:       405.2 
## Residual Deviance: 258.1     AIC: 1532
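
Reading the fit above: the model has 17 coefficients (the intercept plus 4 affinity levels at each of the 4 positions), which matches the 1025 - 1009 = 16 degrees of freedom used by the predictors. One affinity level per position is absorbed into the intercept as the reference, so every coefficient is a shift in log reporter activity relative to that reference level. The position 4 coefficients (0.74 to 0.98) are the largest in the table. The sketch below shows how such a fit can be reproduced and summarised. It runs on simulated data, not on cDNA_df_p53; the reference level name "0_null", the sample size and the effect sizes are assumptions made up for illustration.

```r
# Minimal sketch on SIMULATED data (the real cDNA_df_p53 is not reproduced here).
set.seed(1)

# Hypothetical level names; "0_null" is an assumed reference level.
affinity_levels <- c("0_null", "1_very-weak", "2_weak", "3_medium", "4_strong")
n <- 500

# Draw one random affinity level per reporter for a given position.
draw_affinity <- function() {
  factor(sample(affinity_levels, n, replace = TRUE), levels = affinity_levels)
}

sim_df <- data.frame(
  affinity_pos1 = draw_affinity(),
  affinity_pos2 = draw_affinity(),
  affinity_pos3 = draw_affinity(),
  affinity_pos4 = draw_affinity()
)

# Invented effect: only position 4 shifts the activity.
pos4_effect <- c(0, 0.8, 1.0, 0.75, 0.85)
sim_df$log_reporter_activity <- 0.75 +
  pos4_effect[as.integer(sim_df$affinity_pos4)] +
  rnorm(n, sd = 0.5)

# Same model formula as in the call printed above.
fit <- glm(
  log_reporter_activity ~ affinity_pos1 + affinity_pos2 +
    affinity_pos3 + affinity_pos4,
  family = "gaussian", data = sim_df
)

# Intercept + 4 positions x 4 non-reference levels.
n_coef <- length(coef(fit))

# Fraction of the null deviance explained by the model.
deviance_explained <- 1 - fit$deviance / fit$null.deviance

# The same ratio computed from the deviances printed above.
reported_deviance_explained <- 1 - 258.1 / 405.2
round(reported_deviance_explained, 3)
```

With the deviances printed above, the ratio comes out at roughly 0.36, so the four position-wise affinity factors account for about a third of the variation in log reporter activity and leave the rest unexplained by this additive model.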

Random forest implementation
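
The random forest output is not shown in this rendered report. As a rough sketch of how position-wise importance could be obtained with the randomForest package (attached in the session below), the example fits a regression forest on simulated data and ranks the four positions by permutation importance. The data, the level names, the number of trees and the effect sizes are assumptions for illustration, not the settings used in the analysis.

```r
# Minimal sketch on SIMULATED data; not the model fitted in this analysis.
library(randomForest)
set.seed(1)

# Hypothetical level names; "0_null" is an assumed reference level.
affinity_levels <- c("0_null", "1_very-weak", "2_weak", "3_medium", "4_strong")
n <- 500

# Draw one random affinity level per reporter for a given position.
draw_affinity <- function() {
  factor(sample(affinity_levels, n, replace = TRUE), levels = affinity_levels)
}

sim_df <- data.frame(
  affinity_pos1 = draw_affinity(),
  affinity_pos2 = draw_affinity(),
  affinity_pos3 = draw_affinity(),
  affinity_pos4 = draw_affinity()
)

# Invented effect: only position 4 shifts the activity.
pos4_effect <- c(0, 0.8, 1.0, 0.75, 0.85)
sim_df$log_reporter_activity <- 0.75 +
  pos4_effect[as.integer(sim_df$affinity_pos4)] +
  rnorm(n, sd = 0.5)

# Regression forest; importance = TRUE is needed for permutation importance.
rf <- randomForest(
  log_reporter_activity ~ affinity_pos1 + affinity_pos2 +
    affinity_pos3 + affinity_pos4,
  data = sim_df, ntree = 200, importance = TRUE
)

# type = 1 is the permutation-based measure (%IncMSE for regression).
imp <- importance(rf, type = 1)
imp_sorted <- imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE]
top_position <- rownames(imp_sorted)[1]
imp_sorted
```

Unlike the additive model above, a forest can pick up interactions between positions, for example a neighboring site dampening the effect of the dominant one, without those terms being specified in advance.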

Session Info

paste("Run time: ",format(Sys.time()-StartTime))
## [1] "Run time:  1.188214 mins"
getwd()
## [1] "/DATA/usr/m.trauernicht/projects/SuRE_deep_scan_trp53_gr/stimulation_1"
date()
## [1] "Thu Nov 26 17:53:15 2020"
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.7 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] glmnetUtils_1.1.6   glmnet_4.0-2        Matrix_1.2-18      
##  [4] randomForest_4.6-14 plotly_4.9.2.1      ROCR_1.0-11        
##  [7] tidyr_1.0.0         stringr_1.4.0       readr_1.3.1        
## [10] GGally_1.5.0        gridExtra_2.3       cowplot_1.0.0      
## [13] plyr_1.8.6          viridis_0.5.1       viridisLite_0.3.0  
## [16] ggforce_0.3.1       ggbeeswarm_0.6.0    ggpubr_0.2.5       
## [19] magrittr_1.5        pheatmap_1.0.12     tibble_3.0.1       
## [22] maditr_0.6.3        dplyr_0.8.5         ggplot2_3.3.0      
## [25] RColorBrewer_1.1-2 
## 
## loaded via a namespace (and not attached):
##  [1] httr_1.4.1        jsonlite_1.7.1    splines_3.6.3     foreach_1.4.7    
##  [5] prettydoc_0.4.0   shiny_1.4.0       assertthat_0.2.1  vipor_0.4.5      
##  [9] yaml_2.2.1        pillar_1.4.3      lattice_0.20-38   glue_1.4.2       
## [13] digest_0.6.27     promises_1.1.1    ggsignif_0.6.0    polyclip_1.10-0  
## [17] colorspace_1.4-1  htmltools_0.5.0   httpuv_1.5.4      pkgconfig_2.0.3  
## [21] xtable_1.8-4      purrr_0.3.3       scales_1.1.0      tweenr_1.0.1     
## [25] later_1.1.0.1     mgcv_1.8-31       farver_2.0.1      ellipsis_0.3.0   
## [29] withr_2.1.2       lazyeval_0.2.2    mime_0.9          survival_3.1-8   
## [33] crayon_1.3.4      evaluate_0.14     nlme_3.1-143      MASS_7.3-51.5    
## [37] beeswarm_0.2.3    tools_3.6.3       data.table_1.12.8 hms_0.5.3        
## [41] lifecycle_0.2.0   munsell_0.5.0     compiler_3.6.3    rlang_0.4.8      
## [45] grid_3.6.3        iterators_1.0.12  htmlwidgets_1.5.2 crosstalk_1.0.0  
## [49] labeling_0.3      rmarkdown_2.5     gtable_0.3.0      codetools_0.2-16 
## [53] reshape_0.8.8     R6_2.5.0          knitr_1.30        fastmap_1.0.1    
## [57] shape_1.4.4       stringi_1.5.3     parallel_3.6.3    Rcpp_1.0.5       
## [61] vctrs_0.2.4       tidyselect_1.1.0  xfun_0.19